Efficient Algorithms for Minimizing Cross Validation Error
Authors
Abstract
Model selection is important in many areas of supervised learning. Given a dataset and a set of models for predicting with that dataset, we must choose the model which is expected to best predict future data. In some situations, such as online learning for control of robots or factories, data is cheap and human expertise costly. Cross validation can then be a highly effective method for automatic model selection. Large scale cross validation search can, however, be computationally expensive. This paper introduces new algorithms to reduce the computational burden of such searches. We show how experimental design methods can achieve this, using a technique similar to a Bayesian version of Kaelbling’s Interval Estimation. Several improvements are then given, including (1) the use of blocking to quickly spot near-identical models, and (2) schemata search: a new method for quickly finding families of relevant features. Experiments are presented for robot data and noisy synthetic datasets. The new algorithms speed up computation without sacrificing reliability, and in some cases are more reliable than conventional techniques.
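As a concrete illustration of interval-based elimination during cross validation search, the sketch below races a small set of candidate models fold by fold and discards any model whose confidence interval on per-fold error is already dominated by the current best. This is only a minimal sketch in the spirit of the approach described above, not the paper's algorithm; the candidate library, the fold count, and the normal-approximation intervals are illustrative assumptions.

```python
# Hedged sketch: race candidate models fold by fold and eliminate any model whose
# optimistic error bound is worse than the best pessimistic bound. Not the paper's
# exact algorithm; it only mimics the interval-estimation flavour of the search.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def race_models(models, X, y, n_splits=10, z=1.96):
    """Return the surviving models after interval-based elimination."""
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=0).split(X)
    errors = {name: [] for name in models}          # per-fold MSE for each model
    alive = set(models)
    for train_idx, test_idx in folds:
        for name in list(alive):
            est = models[name]
            est.fit(X[train_idx], y[train_idx])
            errors[name].append(mean_squared_error(y[test_idx], est.predict(X[test_idx])))
        if len(errors[next(iter(alive))]) >= 2:      # need at least 2 folds for an interval
            bounds = {}
            for name in alive:
                e = np.asarray(errors[name])
                half = z * e.std(ddof=1) / np.sqrt(len(e))
                bounds[name] = (e.mean() - half, e.mean() + half)
            best_upper = min(hi for _, hi in bounds.values())
            # Drop any model whose lower bound already exceeds the best upper bound.
            alive = {name for name in alive if bounds[name][0] <= best_upper}
    return alive, errors

# Toy usage on synthetic regression data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=200)
models = {"ridge": Ridge(alpha=1.0), "knn": KNeighborsRegressor(n_neighbors=5)}
print("surviving models:", race_models(models, X, y)[0])
```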
Similar References
Determining optimal value of the shape parameter $c$ in RBF for unequal distances topographical points by Cross-Validation algorithm
Several radial basis function based methods contain a free shape parameter which plays a crucial role in the accuracy of the methods. Evaluating the performance of this parameter for different basis functions and various data has long been a topic of study. In the present paper, we study methods that determine an optimal value for the shape parameter in interpolations of radial basis ...
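A minimal sketch of the idea, under assumptions not taken from the paper (a Gaussian kernel, 1-D unequally spaced sample sites, and a simple grid of candidate values): the shape parameter c is chosen by leave-one-out cross-validation of the RBF interpolant.

```python
# Hedged sketch: choose the Gaussian RBF shape parameter c by leave-one-out
# cross-validation on scattered 1-D sample sites. Kernel, grid, and data are
# illustrative assumptions, not the paper's setup.
import numpy as np

def gaussian_rbf(r, c):
    return np.exp(-(c * r) ** 2)

def loocv_error(x, y, c):
    """Mean squared leave-one-out interpolation error for shape parameter c."""
    n = len(x)
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        xt, yt = x[mask], y[mask]
        A = gaussian_rbf(np.abs(xt[:, None] - xt[None, :]), c)   # interpolation matrix
        w = np.linalg.solve(A, yt)                               # RBF weights
        pred = gaussian_rbf(np.abs(x[i] - xt), c) @ w            # predict the held-out point
        errs.append((pred - y[i]) ** 2)
    return float(np.mean(errs))

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 25))            # unequally spaced sample sites
y = np.sin(2 * np.pi * x) + 0.05 * rng.normal(size=x.size)
candidates = np.logspace(-1, 1.5, 30)             # grid of candidate shape parameters
best_c = min(candidates, key=lambda c: loocv_error(x, y, c))
print("selected shape parameter:", best_c)
```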
Stacked Generalization: An Introduction to Super Learning
Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into what is now known as “Super Learner”. Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Opt...
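The following sketch illustrates the Super Learner idea under simplifying assumptions: out-of-fold predictions from a small candidate library are combined with non-negative least-squares weights normalised to sum to one. The library and this meta-learning step are illustrative, not the exact Super Learner specification.

```python
# Hedged sketch of a Super Learner-style stack: V-fold out-of-fold predictions
# from a small library are combined with non-negative weights fitted by least
# squares and normalised to a convex combination.
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor

def super_learner_weights(library, X, y, v=10):
    """Fit non-negative combination weights on V-fold out-of-fold predictions."""
    Z = np.zeros((len(y), len(library)))               # out-of-fold prediction matrix
    for train_idx, test_idx in KFold(n_splits=v, shuffle=True, random_state=0).split(X):
        for j, est in enumerate(library):
            est.fit(X[train_idx], y[train_idx])
            Z[test_idx, j] = est.predict(X[test_idx])
    w, _ = nnls(Z, y)                                  # non-negative least squares
    return w / w.sum() if w.sum() > 0 else w           # normalise to a convex combination

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=200)
library = [LinearRegression(), DecisionTreeRegressor(max_depth=4), KNeighborsRegressor(5)]
print("combination weights:", super_learner_weights(library, X, y))
```

In the full procedure the library members would also be refit on the complete dataset before being combined with these weights; the sketch stops at the weight-fitting step.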
Estimation of parameters of metal-oxide surge arrester models using Big Bang-Big Crunch and Hybrid Big Bang-Big Crunch algorithms
Accurate modeling of metal-oxide surge arresters and identification of their parameters are very important for insulation coordination studies, arrester allocation, and system reliability, since the quality and reliability of lightning performance studies can be improved with a more efficient representation of the arresters' dynamic behavior. In this paper, Big Bang-Big Crunch and Hybrid Big Bang-Big Cr...
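For orientation, here is a hedged sketch of the plain Big Bang-Big Crunch heuristic applied to a generic least-squares fitting objective. The arrester model itself is not reproduced; the `objective` function below merely stands in for the simulation-versus-measurement error such studies minimise.

```python
# Hedged sketch of the plain Big Bang-Big Crunch heuristic: scatter candidates
# around a centre (Big Bang), contract to the fitness-weighted centre of mass
# (Big Crunch), and shrink the scatter radius over iterations.
import numpy as np

def big_bang_big_crunch(objective, lower, upper, pop_size=50, iters=100, seed=0):
    """Minimise `objective` over the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    center = rng.uniform(lower, upper)                       # initial centre of mass
    for k in range(1, iters + 1):
        # Big Bang: scatter candidates around the centre with a shrinking radius.
        radius = (upper - lower) * rng.standard_normal((pop_size, dim)) / k
        pop = np.clip(center + radius, lower, upper)
        fitness = np.array([objective(p) for p in pop])
        # Big Crunch: contract to the fitness-weighted centre of mass.
        weights = 1.0 / (fitness + 1e-12)
        center = (weights[:, None] * pop).sum(axis=0) / weights.sum()
    return center, objective(center)

# Toy usage: recover two parameters of a damped exponential from clean samples.
t = np.linspace(0.0, 1.0, 50)
true = np.array([2.0, 3.0])
data = true[0] * np.exp(-true[1] * t)
objective = lambda p: np.mean((p[0] * np.exp(-p[1] * t) - data) ** 2)
print(big_bang_big_crunch(objective, lower=[0, 0], upper=[5, 5]))
```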
Bootstrapping the Out-of-sample Predictions for Efficient and Accurate Cross-Validation
Cross-Validation (CV), and out-of-sample performance-estimation protocols in general, are often employed both for (a) selecting the optimal combination of algorithms and values of hyper-parameters (called a configuration) for producing the final predictive model, and (b) estimating the predictive performance of the final model. However, the cross-validated performance of the best configuration ...
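A rough sketch of the bootstrap idea, with assumed details (a k-nearest-neighbour configuration grid and squared-error loss): the out-of-sample predictions of all configurations are pooled, the winning configuration is re-selected on each bootstrap sample, and its error is scored on the rows left out of that sample.

```python
# Hedged sketch in the spirit of bootstrap bias correction of CV: pool the
# out-of-sample predictions of every configuration, repeatedly pick the winner
# on a bootstrap sample, and score it on the left-out rows.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

def bbc_cv_estimate(configs, X, y, n_splits=10, n_boot=500, seed=0):
    """Bias-corrected MSE estimate for the configuration chosen by CV."""
    n = len(y)
    P = np.zeros((n, len(configs)))                       # pooled out-of-sample predictions
    for train_idx, test_idx in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        for j, make_est in enumerate(configs):
            est = make_est()
            est.fit(X[train_idx], y[train_idx])
            P[test_idx, j] = est.predict(X[test_idx])
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        boot = rng.integers(0, n, n)                      # bootstrap row indices
        out = np.setdiff1d(np.arange(n), boot)            # rows not in the bootstrap sample
        if out.size == 0:
            continue
        winner = np.argmin([mean_squared_error(y[boot], P[boot, j]) for j in range(len(configs))])
        scores.append(mean_squared_error(y[out], P[out, winner]))
    return float(np.mean(scores))

# Illustrative configuration grid: k-NN regressors with different neighbourhood sizes.
configs = [lambda k=k: KNeighborsRegressor(n_neighbors=k) for k in (1, 3, 5, 9, 15)]
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(150, 2))
y = np.sin(X[:, 0]) + 0.2 * rng.normal(size=150)
print("bias-corrected MSE:", bbc_cv_estimate(configs, X, y))
```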
Study of Classification Algorithms using Moment Analysis
In this short paper we briefly discuss a moment-based method that was recently introduced to study the behavior of classification algorithms and model validation techniques for finite sample sizes. The method involves accurate and efficient computation of the moments of the generalization error over the space of all possible datasets of size N drawn from an underlying distribution. A cla...
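To make the quantity concrete, the brute-force Monte Carlo sketch below estimates the first two moments of the generalization error of a simple classifier over training sets of size N drawn from a known synthetic distribution. The referenced work computes such moments analytically and efficiently; this sketch does not attempt that, and the classifier and distribution are illustrative assumptions.

```python
# Hedged sketch: Monte Carlo estimate of the mean and variance of the
# generalization error of a k-NN classifier over training sets of size N.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sample_dataset(n, rng):
    """Two shifted Gaussian classes in 2-D; labels are 0/1."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2)) + y[:, None] * 1.5
    return X, y

def error_moments(N, n_datasets=300, n_test=2000, seed=0):
    rng = np.random.default_rng(seed)
    Xte, yte = sample_dataset(n_test, rng)                # large test set approximates true error
    errs = []
    for _ in range(n_datasets):
        Xtr, ytr = sample_dataset(N, rng)                 # one training set of size N
        clf = KNeighborsClassifier(n_neighbors=3).fit(Xtr, ytr)
        errs.append(np.mean(clf.predict(Xte) != yte))     # generalization error for this dataset
    errs = np.asarray(errs)
    return errs.mean(), errs.var(ddof=1)                  # first two moments over datasets

print("mean, variance of generalization error:", error_moments(N=30))
```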